Overview

Dataset statistics

Number of variables: 13
Number of observations: 745
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 75.8 KiB
Average record size in memory: 104.2 B
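
The average record size follows from the total memory size and the number of observations. A minimal sketch of that arithmetic, using only the figures reported above (the variable names are illustrative, not part of the report):

```python
KIB = 1024  # bytes per KiB

# Figures copied from the dataset statistics above.
total_size_bytes = 75.8 * KIB
n_observations = 745
n_variables = 13

# Average bytes per row, and the number of cells checked for missing values.
avg_record_bytes = total_size_bytes / n_observations
total_cells = n_observations * n_variables

print(round(avg_record_bytes, 1))  # 104.2
print(total_cells)                 # 9685
```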

Variable types

Numeric: 6
Categorical: 6
Boolean: 1

Alerts

Oldpeak is highly correlated with HeartDisease [High correlation]
HeartDisease is highly correlated with Oldpeak [High correlation]
HeartDisease is highly correlated with ExerciseAngina and 2 other fields [High correlation]
ExerciseAngina is highly correlated with HeartDisease and 1 other fields [High correlation]
ST_Slope is highly correlated with HeartDisease and 1 other fields [High correlation]
ChestPainType is highly correlated with HeartDisease [High correlation]
Unnamed: 0 is highly correlated with RestingECG [High correlation]
ChestPainType is highly correlated with ExerciseAngina and 1 other fields [High correlation]
RestingECG is highly correlated with Unnamed: 0 [High correlation]
MaxHR is highly correlated with ExerciseAngina [High correlation]
ExerciseAngina is highly correlated with ChestPainType and 3 other fields [High correlation]
Oldpeak is highly correlated with ExerciseAngina and 2 other fields [High correlation]
ST_Slope is highly correlated with Oldpeak [High correlation]
HeartDisease is highly correlated with ChestPainType and 2 other fields [High correlation]
Unnamed: 0 has unique values [Unique]
Oldpeak has 317 (42.6%) zeros [Zeros]

Reproduction

Analysis started: 2021-10-02 00:38:15.173348
Analysis finished: 2021-10-02 00:38:24.091571
Duration: 8.92 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
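
A report like this one can be regenerated with pandas-profiling. The sketch below is an assumption about how that might look, not the command actually used here: the function name `build_report`, the report title, and the default output path are hypothetical, and the source CSV name is not given in the report.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative helper)."""
    # Imported lazily so this module loads even without the packages installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # A column named "Unnamed: 0" typically appears when a CSV was saved with
    # its index; passing index_col=0 to read_csv would absorb it instead.
    df = pd.read_csv(csv_path)

    profile = ProfileReport(df, title="Heart disease dataset")  # hypothetical title
    profile.to_file(output_path)
    return profile
```

Calling `build_report("heart.csv")` (a placeholder filename) would write `report.html` next to the script.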

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 745
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 475.4442953
Minimum: 0
Maximum: 917
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 37.2
Q1: 186
median: 545
Q3: 731
95-th percentile: 879.8
Maximum: 917
Range: 917
Interquartile range (IQR): 545

Descriptive statistics

Standard deviation: 290.3283596
Coefficient of variation (CV): 0.6106464257
Kurtosis: -1.444595285
Mean: 475.4442953
Median Absolute Deviation (MAD): 273
Skewness: -0.1654360167
Sum: 354206
Variance: 84290.55637
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
917    1      0.1%
244    1      0.1%
253    1      0.1%
252    1      0.1%
251    1      0.1%
250    1      0.1%
249    1      0.1%
248    1      0.1%
247    1      0.1%
246    1      0.1%
Other values (735): 735 (98.7%)

Minimum 10 values

Value  Count  Frequency (%)
0      1      0.1%
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
917    1      0.1%
916    1      0.1%
915    1      0.1%
914    1      0.1%
913    1      0.1%
912    1      0.1%
911    1      0.1%
910    1      0.1%
909    1      0.1%
908    1      0.1%

Age
Real number (ℝ≥0)

Distinct: 49
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.90604027
Minimum: 28
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 28
5-th percentile: 37
Q1: 46
median: 54
Q3: 59
95-th percentile: 68
Maximum: 77
Range: 49
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.489625264
Coefficient of variation (CV): 0.1793675205
Kurtosis: -0.3819061477
Mean: 52.90604027
Median Absolute Deviation (MAD): 6
Skewness: -0.1021927106
Sum: 39415
Variance: 90.05298766
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=49)

Common values

Value  Count  Frequency (%)
54     47     6.3%
58     36     4.8%
55     35     4.7%
57     31     4.2%
52     30     4.0%
48     29     3.9%
56     28     3.8%
59     26     3.5%
51     26     3.5%
62     25     3.4%
Other values (39): 432 (58.0%)

Minimum 10 values

Value  Count  Frequency (%)
28     1      0.1%
29     3      0.4%
30     1      0.1%
31     2      0.3%
32     4      0.5%
33     2      0.3%
34     6      0.8%
35     9      1.2%
36     5      0.7%
37     11     1.5%

Maximum 10 values

Value  Count  Frequency (%)
77     2      0.3%
76     2      0.3%
75     3      0.4%
74     5      0.7%
72     3      0.4%
71     5      0.7%
70     5      0.7%
69     9      1.2%
68     7      0.9%
67     13     1.7%
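
Several of the statistics reported for Age are derived from the others, so they can be cross-checked. The sketch below uses only numbers printed in the Age section; the variable names are illustrative.

```python
import math

# Values copied from the Age section of the report.
mean = 52.90604027
std = 9.489625264
variance = 90.05298766
minimum, maximum = 28, 77
q1, q3 = 46, 59
total, n = 39415, 745

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)

print(value_range, iqr)                                  # 49 13
print(round(cv, 4))                                      # 0.1794
print(math.isclose(total / n, mean, rel_tol=1e-6))       # mean = sum / count
print(math.isclose(std ** 2, variance, rel_tol=1e-6))    # variance = std squared
```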

Sex
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: F
5th row: M

Common Values

Value  Count  Frequency (%)
M      563    75.6%
F      182    24.4%

[Figure: histogram of lengths of the category]
[Figure: pie chart of category frequencies]

ChestPainType
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 3
Median length: 3
Mean length: 2.944966443
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ATA
2nd row: NAP
3rd row: ATA
4th row: ASY
5th row: NAP

Common Values

Value  Count  Frequency (%)
ASY    370    49.7%
NAP    168    22.6%
ATA    166    22.3%
TA     41     5.5%

[Figure: histogram of lengths of the category]
[Figure: pie chart of category frequencies]

RestingBP
Real number (ℝ≥0)

Distinct: 63
Distinct (%): 8.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 133.0362416
Minimum: 92
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 92
5-th percentile: 110
Q1: 120
median: 130
Q3: 140
95-th percentile: 160
Maximum: 200
Range: 108
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 17.2904513
Coefficient of variation (CV): 0.1299679779
Kurtosis: 0.7391062279
Mean: 133.0362416
Median Absolute Deviation (MAD): 10
Skewness: 0.6179987524
Sum: 99112
Variance: 298.9597063
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
120    110    14.8%
130    102    13.7%
140    95     12.8%
150    48     6.4%
110    46     6.2%
160    39     5.2%
125    20     2.7%
132    16     2.1%
138    15     2.0%
128    14     1.9%
Other values (53): 240 (32.2%)

Minimum 10 values

Value  Count  Frequency (%)
92     1      0.1%
94     2      0.3%
96     1      0.1%
98     1      0.1%
100    11     1.5%
101    1      0.1%
102    2      0.3%
104    2      0.3%
105    4      0.5%
106    3      0.4%

Maximum 10 values

Value  Count  Frequency (%)
200    2      0.3%
192    1      0.1%
190    2      0.3%
180    10     1.3%
178    2      0.3%
174    1      0.1%
172    2      0.3%
170    12     1.6%
165    1      0.1%
164    1      0.1%

Cholesterol
Real number (ℝ≥0)

Distinct: 221
Distinct (%): 29.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 244.747651
Minimum: 85
Maximum: 603
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 85
5-th percentile: 166
Q1: 208
median: 237
Q3: 275
95-th percentile: 339.8
Maximum: 603
Range: 518
Interquartile range (IQR): 67

Descriptive statistics

Standard deviation: 59.11368852
Coefficient of variation (CV): 0.2415291353
Kurtosis: 4.54423971
Mean: 244.747651
Median Absolute Deviation (MAD): 34
Skewness: 1.240816716
Sum: 182337
Variance: 3494.42817
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
254    11     1.5%
223    10     1.3%
220    10     1.3%
204    9      1.2%
230    9      1.2%
211    9      1.2%
216    9      1.2%
240    8      1.1%
219    8      1.1%
246    8      1.1%
Other values (211): 654 (87.8%)

Minimum 10 values

Value  Count  Frequency (%)
85     1      0.1%
100    2      0.3%
110    1      0.1%
113    1      0.1%
117    1      0.1%
123    1      0.1%
126    2      0.3%
129    1      0.1%
131    1      0.1%
132    1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
603    1      0.1%
564    1      0.1%
529    1      0.1%
518    1      0.1%
491    1      0.1%
468    1      0.1%
466    1      0.1%
458    1      0.1%
417    1      0.1%
412    1      0.1%

FastingBS
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      620    83.2%
1      125    16.8%

[Figure: histogram of lengths of the category]
[Figure: pie chart of category frequencies]

RestingECG
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 6
Median length: 6
Mean length: 4.625503356
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Normal
2nd row: Normal
3rd row: ST
4th row: Normal
5th row: Normal

Common Values

Value   Count  Frequency (%)
Normal  445    59.7%
LVH     176    23.6%
ST      124    16.6%

[Figure: histogram of lengths of the category]
[Figure: pie chart of category frequencies]

MaxHR
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 109
Distinct (%): 14.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 140.209396
Minimum: 69
Maximum: 202
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 69
5-th percentile: 98.2
Q1: 122
median: 140
Q3: 160
95-th percentile: 179
Maximum: 202
Range: 133
Interquartile range (IQR): 38

Descriptive statistics

Standard deviation: 24.5361083
Coefficient of variation (CV): 0.1749961772
Kurtosis: -0.5544883382
Mean: 140.209396
Median Absolute Deviation (MAD): 19
Skewness: -0.1632347651
Sum: 104456
Variance: 602.0206105
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
150    39     5.2%
140    37     5.0%
130    27     3.6%
160    24     3.2%
120    21     2.8%
170    19     2.6%
125    17     2.3%
110    16     2.1%
135    14     1.9%
122    14     1.9%
Other values (99): 517 (69.4%)

Minimum 10 values

Value  Count  Frequency (%)
69     1      0.1%
71     1      0.1%
73     1      0.1%
80     1      0.1%
82     1      0.1%
84     2      0.3%
86     2      0.3%
87     1      0.1%
88     2      0.3%
90     2      0.3%

Maximum 10 values

Value  Count  Frequency (%)
202    1      0.1%
195    1      0.1%
194    1      0.1%
192    1      0.1%
190    2      0.3%
188    2      0.3%
187    1      0.1%
186    2      0.3%
185    4      0.5%
184    4      0.5%

ExerciseAngina
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 873.0 B

Value  Count  Frequency (%)
False  458    61.5%
True   287    38.5%

Oldpeak
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 42
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9029530201
Minimum: 0
Maximum: 6.2
Zeros: 317
Zeros (%): 42.6%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.5
Q3: 1.5
95-th percentile: 3
Maximum: 6.2
Range: 6.2
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 1.072952854
Coefficient of variation (CV): 1.188270962
Kurtosis: 1.359375243
Mean: 0.9029530201
Median Absolute Deviation (MAD): 0.5
Skewness: 1.218105519
Sum: 672.7
Variance: 1.151227827
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=42)

Common values

Value  Count  Frequency (%)
0      317    42.6%
1      68     9.1%
2      58     7.8%
1.5    39     5.2%
3      23     3.1%
1.2    22     3.0%
0.2    19     2.6%
1.8    15     2.0%
1.4    15     2.0%
0.8    15     2.0%
Other values (32): 154 (20.7%)

Minimum 10 values

Value  Count  Frequency (%)
0      317    42.6%
0.1    10     1.3%
0.2    19     2.6%
0.3    9      1.2%
0.4    10     1.3%
0.5    12     1.6%
0.6    14     1.9%
0.7    1      0.1%
0.8    15     2.0%
0.9    3      0.4%

Maximum 10 values

Value  Count  Frequency (%)
6.2    1      0.1%
5.6    1      0.1%
5      1      0.1%
4.4    1      0.1%
4.2    2      0.3%
4      8      1.1%
3.8    1      0.1%
3.6    4      0.5%
3.5    2      0.3%
3.4    3      0.4%

ST_Slope
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.065771812
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Up
2nd row: Flat
3rd row: Up
4th row: Flat
5th row: Up

Common Values

Value  Count  Frequency (%)
Flat   354    47.5%
Up     348    46.7%
Down   43     5.8%

[Figure: histogram of lengths of the category]
[Figure: pie chart of category frequencies]

HeartDisease
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      389    52.2%
1      356    47.8%

[Figure: histogram of lengths of the category]
[Figure: pie chart of category frequencies]

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
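
The definition above amounts to applying Pearson's formula to ranks. A minimal pure-Python sketch follows; the helper names `average_ranks`, `pearson`, and `spearman` are illustrative, and giving tied values the average of their positions is an assumption about tie handling rather than something stated in this report.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over product of standard deviations (the 1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
rho = spearman(x, y)
r = pearson(x, y)
```

On this toy data ρ is 1 (up to floating-point rounding) because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.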

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
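
The formula and the invariance property can be checked on a small made-up sample. This is a sketch only: the data and the function name `pearson` are illustrative.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/n factors in the covariance and standard deviations cancel out.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up, roughly linear in x

r = pearson(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Rescaling one variable by a negative factor only flips the sign.
r_flipped = pearson(x, [-b for b in y])
```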

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
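
The pair-counting definition above can be sketched directly. This version is the plain variant usually called tau-a; library implementations commonly apply a tie adjustment (tau-b), so values may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie adjustment."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; such pairs count as neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# 5 concordant pairs and 1 discordant pair out of 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
```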

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
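
A sketch of a bias-corrected Cramér's V for a contingency table, following the correction as it is commonly stated. It is an illustration under assumptions, not the report's own code: the function name is hypothetical, no continuity correction is applied, and every row and column is assumed to have a non-zero total.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias and shrink the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


v_perfect = cramers_v_corrected([[50, 0], [0, 50]])        # perfectly associated
v_independent = cramers_v_corrected([[25, 25], [25, 25]])  # independent
```

The two toy tables hit the ends of the range: the diagonal table gives a value of 1 (up to rounding) and the uniform table gives 0.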

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     Unnamed: 0  Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  ExerciseAngina  Oldpeak  ST_Slope  HeartDisease
0    0           40   M    ATA            140        289          0          Normal      172    N               0.0      Up        0
1    1           49   F    NAP            160        180          0          Normal      156    N               1.0      Flat      1
2    2           37   M    ATA            130        283          0          ST          98     N               0.0      Up        0
3    3           48   F    ASY            138        214          0          Normal      108    Y               1.5      Flat      1
4    4           54   M    NAP            150        195          0          Normal      122    N               0.0      Up        0
5    5           39   M    NAP            120        339          0          Normal      170    N               0.0      Up        0
6    6           45   F    ATA            130        237          0          Normal      170    N               0.0      Up        0
7    7           54   M    ATA            110        208          0          Normal      142    N               0.0      Up        0
8    8           37   M    ASY            140        207          0          Normal      130    Y               1.5      Flat      1
9    9           48   F    ATA            120        284          0          Normal      120    N               0.0      Up        0

Last rows

     Unnamed: 0  Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  ExerciseAngina  Oldpeak  ST_Slope  HeartDisease
735  908         63   M    ASY            140        187          0          LVH         144    Y               4.0      Up        1
736  909         63   F    ASY            124        197          0          Normal      136    Y               0.0      Flat      1
737  910         41   M    ATA            120        157          0          Normal      182    N               0.0      Up        0
738  911         59   M    ASY            164        176          1          LVH         90     N               1.0      Flat      1
739  912         57   F    ASY            140        241          0          Normal      123    Y               0.2      Flat      1
740  913         45   M    TA             110        264          0          Normal      132    N               1.2      Flat      1
741  914         68   M    ASY            144        193          1          Normal      141    N               3.4      Flat      1
742  915         57   M    ASY            130        131          0          Normal      115    Y               1.2      Flat      1
743  916         57   F    ATA            130        236          0          LVH         174    N               0.0      Flat      1
744  917         38   M    NAP            138        175          0          Normal      173    N               0.0      Up        0